We're at the border of going from agent designs without learning to agent designs with learning.
And yesterday we culminated our study of reasoning and deciding under uncertainty, which is what realistic real-world agents have to do, with a very brief introduction to POMDPs: partially observable Markov decision processes.
Markov decision processes, you remember, are environments where we have deterministic, reliable sensors, which makes the environment fully observable, but non-deterministic actions.
You do something, but that can go wrong.
POMDPs add unreliable sensors to that mix, and that is essentially everything that can go wrong.
So in our little example, we added an unreliable sensor, which basically just introduces an additional source of error into our considerations.
And there is really only one thing to remember about POMDPs: you can solve them by doing MDP computations not at the level of states, but at the level of belief states.
A very simple realization that has a lot of consequences, both good and bad.
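In symbols (standard POMDP notation, which this recap itself does not spell out), the step that lifts an MDP to the belief level is the belief update: after performing action a and receiving observation o, the new belief b' is

\[
  b'(s') = \alpha \, P(o \mid s') \sum_{s} P(s' \mid s, a)\, b(s)
\]

where alpha is just the normalization constant that makes b' a probability distribution again.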
One of the good things is that we are essentially getting exactly what we need, namely that the belief state is fully observable.
We know what we believe as an agent.
We can observe our own belief state.
And so we can actually compute on that.
We can't get below the belief state because we know nothing, in principle, about the actual
state.
We can only have approximations or beliefs over the set of possible states.
So the good thing is we can, in principle, run MDP algorithms one level up.
The bad news is that we only know MDP algorithms that can deal with discrete state spaces, and the belief space is not discrete; it is continuous.
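To make that concrete, here is a minimal sketch (not from the lecture; the two-state model and its numbers are made up) of a belief update in code. It shows that a belief state is just a probability vector, updated deterministically from the last action and observation:

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One POMDP belief-update step (a discrete Bayes filter).

    belief    : length-n vector, belief[s]   = P(current state = s)
    T[action] : n x n matrix,    T[a][s, s'] = P(s' | s, a)
    O[action] : n x m matrix,    O[a][s', o] = P(o | s', a)

    Returns b'(s') proportional to P(o | s', a) * sum_s P(s' | s, a) * b(s).
    """
    predicted = T[action].T @ belief              # prediction: sum_s P(s'|s,a) b(s)
    unnormalized = O[action][:, observation] * predicted
    return unnormalized / unnormalized.sum()      # renormalize to a distribution

# Tiny made-up two-state model: one noisy action, one noisy sensor.
T = {0: np.array([[0.8, 0.2],
                  [0.1, 0.9]])}                   # P(s' | s, action 0)
O = {0: np.array([[0.9, 0.1],
                  [0.3, 0.7]])}                   # P(o  | s', action 0)

b = np.array([0.5, 0.5])                          # start maximally uncertain
b = belief_update(b, action=0, observation=1, T=T, O=O)
print(b)  # belief shifts toward state 1, which explains observation 1 best
```

The update itself is deterministic and fully known to the agent, which is exactly why the belief state counts as fully observable; but the belief vector ranges over a continuous simplex, which is why the discrete MDP algorithms we know do not apply directly.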
So even though we had this wonderful realization that we don't have to learn anything new, that hope turns out to be false, because we didn't learn enough earlier.
If we had had the full theory of MDPs in continuous spaces, which we did not cover because it requires much more mathematical foundation, then we could apply it here.
But these AI lectures are just an introduction, and the literature is out there, so you can
actually read up on this if you're interested.
The other thing to realize is that the real world is indeed a POMDP.
Whatever we do, we don't have reliable actions.
Have you ever tried to shoot a hoop in basketball? A typical thing, at least for me: I plan it as well as I can, I know exactly where the ball should go, only it doesn't go there.
And of course, we have, in practice, limited accuracies on all of our sensors.
Sometimes that matters, sometimes that doesn't matter.
So in a way, the conclusion from the theoretical part was that the real world is a POMDP, meaning we do have the theory to describe it. But all the algorithms are pretty terrible in their complexity.
And everything grows tremendously as our models get finer and finer.
Recap: Partially Observable MDPs
The main video on this topic is in chapter 7, clip 5.